Unified search string construction between albums and items. #6117

semohr · 2025-10-21T09:53:04Z

Description

Matching albums and singletons has slightly different behavior between singletons and albums.

Albums: When a user does not specify any search query in the manual prompt, the system defaults to using the corresponding value from the metadata.
Singletons: Overwrite metadata by any given noneempty search query.

Additionally I noticed that we have an unintentional type error in the metadata_plugins.candidates call. Since *arg and **kwargs are not typed they accepted artist and title/album of type str|None while they should only accept str.

Changes

This PR aligns both methods and uses a common function to parse the search terms. This intentionally couples these two occasions to hopefully prevent drift in the future and makes clear that we use the same logic in these two places.

Added explicit typehints to candidates and item_candidates function.

Copilot

Pull Request Overview

This PR standardizes search term handling between album and singleton matching in the autotag system. Previously, albums defaulted to metadata when no search query was provided, while singletons would overwrite metadata with any non-empty query. The changes introduce a unified _parse_search_terms_with_fallbacks function that applies consistent logic to both cases and adds explicit type hints to the candidates and item_candidates functions to catch type errors where str|None was accepted instead of the required str.

Key Changes:

Introduced _parse_search_terms_with_fallbacks function to unify search term parsing logic
Added explicit type hints to candidates and item_candidates functions to enforce str parameters
Updated documentation to clarify that empty strings are passed when no metadata is available

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
docs/changelog.rst	Documents the standardization of search parameter handling
beets/metadata_plugins.py	Adds explicit type hints to `candidates` and `item_candidates` functions and updates documentation
beets/autotag/match.py	Introduces `_parse_search_terms_with_fallbacks` and refactors both `tag_album` and `tag_item` to use unified search term parsing

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

beets/autotag/match.py

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Ensure that custom metadata plugins are updated to match the new named-argument signature for candidates and item_candidates to avoid breakage.
Consider renaming _parse_search_terms_with_fallbacks to a more concise name (e.g. _choose_search_terms) for better readability.
Clarify in the docstring how empty strings vs None values are treated by the fallback logic to avoid confusion.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Ensure that custom metadata plugins are updated to match the new named-argument signature for candidates and item_candidates to avoid breakage.
- Consider renaming _parse_search_terms_with_fallbacks to a more concise name (e.g. _choose_search_terms) for better readability.
- Clarify in the docstring how empty strings vs None values are treated by the fallback logic to avoid confusion.

## Individual Comments

### Comment 1
<location> `beets/autotag/match.py:252` </location>
<code_context>
def tag_album(
    items,
    search_artist: str | None = None,
    search_album: str | None = None,
    search_ids: list[str] = [],
) -> tuple[str, str, Proposal]:
    """Return a tuple of the current artist name, the current album
    name, and a `Proposal` containing `AlbumMatch` candidates.

    The artist and album are the most common values of these fields
    among `items`.

    The `AlbumMatch` objects are generated by searching the metadata
    backends. By default, the metadata of the items is used for the
    search. This can be customized by setting the parameters.
    `search_ids` is a list of metadata backend IDs: if specified,
    it will restrict the candidates to those IDs, ignoring
    `search_artist` and `search album`. The `mapping` field of the
    album has the matched `items` as keys.

    The recommendation is calculated from the match quality of the
    candidates.
    """
    # Get current metadata.
    likelies, consensus = get_most_common_tags(items)
    cur_artist: str = likelies["artist"]
    cur_album: str = likelies["album"]
    log.debug("Tagging {} - {}", cur_artist, cur_album)

    # The output result, keys are the MB album ID.
    candidates: dict[Any, AlbumMatch] = {}

    # Search by explicit ID.
    if search_ids:
        for search_id in search_ids:
            log.debug("Searching for album ID: {}", search_id)
            if info := metadata_plugins.album_for_id(search_id):
                _add_candidate(items, candidates, info)

    # Use existing metadata or text search.
    else:
        # Try search based on current ID.
        if info := match_by_id(items):
            _add_candidate(items, candidates, info)
            rec = _recommendation(list(candidates.values()))
            log.debug("Album ID match recommendation is {}", rec)
            if candidates and not config["import"]["timid"]:
                # If we have a very good MBID match, return immediately.
                # Otherwise, this match will compete against metadata-based
                # matches.
                if rec == Recommendation.strong:
                    log.debug("ID match.")
                    return (
                        cur_artist,
                        cur_album,
                        Proposal(list(candidates.values()), rec),
                    )

        # Manually provided search terms or fallbacks.
        _search_artist, _search_album = _parse_search_terms_with_fallbacks(
            (search_artist, cur_artist),
            (search_album, cur_album),
        )
        log.debug("Search terms: {} - {}", _search_artist, _search_album)

        # Is this album likely to be a "various artist" release?
        va_likely = (
            (not consensus["artist"])
            or (_search_artist.lower() in VA_ARTISTS)
            or any(item.comp for item in items)
        )
        log.debug("Album might be VA: {}", va_likely)

        # Get the results from the data sources.
        for matched_candidate in metadata_plugins.candidates(
            items, _search_artist, _search_album, va_likely
        ):
            _add_candidate(items, candidates, matched_candidate)

    log.debug("Evaluating {} candidates.", len(candidates))
    # Sort and get the recommendation.
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return cur_artist, cur_album, Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** Replace mutable default arguments with None ([`default-mutable-arg`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/default-mutable-arg/))
</issue_to_address>

### Comment 2
<location> `beets/autotag/match.py:340` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.

    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None

    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)

    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)

    # Manually provided search terms or fallbacks.
    _search_artist, _search_title = _parse_search_terms_with_fallbacks(
        (search_artist, item.artist),
        (search_title, item.title),
    )
    log.debug("Item search terms: {} - {}", _search_artist, _search_title)

    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item,
        _search_artist,
        _search_title,
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)

    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

beets/autotag/match.py

codecov · 2025-10-21T10:05:16Z

Codecov Report

❌ Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.47%. Comparing base (88ca0ce) to head (33d1a35).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
beets/autotag/match.py	86.66%	0 Missing and 2 partials ⚠️
beets/metadata_plugins.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6117   +/-   ##
=======================================
  Coverage   67.46%   67.47%           
=======================================
  Files         136      136           
  Lines       18532    18534    +2     
  Branches     3130     3130           
=======================================
+ Hits        12503    12506    +3     
  Misses       5364     5364           
+ Partials      665      664    -1

Files with missing lines	Coverage Δ
beets/autotag/match.py	`77.84% <86.66%> (+0.92%)`	⬆️
beets/metadata_plugins.py	`76.52% <50.00%> (ø)`

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

snejus · 2025-10-22T23:49:57Z

beets/autotag/match.py

    )


+def _parse_search_terms_with_fallbacks(


Seeing that you still want to use this helper method after reading my comments, I'm now concerned about my ability to understand the code.

I honestly have an issue trying to understand what this function does. I failed to do this by reading the code, so I had to manually run search in order to see what it does.

I've got a simple mp3 track:

$ tag track.mp3 IDv2 tag info for track.mp3 TALB=album TIT2=title

Previously

import artist search album/title search resulting search terms

album None None - album

album "Artist" "" - album

album "" "Album" - album

album "Artist" "Album" Artist - Album

singleton None None - title

singleton "Artist" "" Artist - title

singleton "" "Title" - Title

singleton "Artist" "Title" Artist - Title

I'm actually surprised to see that album search requires both terms. Apparently, I misunderstood even the simple if not (search_artist and search_album) condition 😁.

Now

import artist search album/title search resulting search terms

album None None - album

album "Artist" "" Artist -

album "" "Album" - Album

album "Artist" "Album" Artist - Album

singleton None None - title

singleton "Artist" "" Artist -

singleton "" "Title" - Title

singleton "Artist" "Title" Artist - Title

I take back what I said regarding 'we're keeping the previous functionality'. What you've got here aligns with my expectation: we only need one of the search terms to be present for the search to override item data.

Yeah I also had issue with understanding the logic especially since it was different in singletons and albums. Have you seen the test I added?

I did, see the rest of the comments

beets/autotag/match.py

snejus · 2025-10-23T08:41:07Z

beets/autotag/match.py

+        _search_artist, _search_album = _parse_search_terms_with_fallbacks(
+            (search_artist, cur_artist),
+            (search_album, cur_album),
+        )


Based on my comment above, we don't need to handle Nones any more, so this can be simplified to

if not search_artist and not search_album: search_artist, search_album = cur_artist, cur_album

beets/autotag/match.py

beets/metadata_plugins.py

beets/autotag/match.py

semohr · 2025-11-17T10:07:04Z

@snejus How do we want to continue here?

We could remove the _parse_search_terms_with_fallbacks function and revert to an inline approach, but I’m a bit concerned that an inline solution might drift out of sync again over time. I’d also prefer to keep the new tests, since they effectively serve as a contract for the search-parsing behavior.

That said, in the grand scheme I don't care too much as long as the tests remain in place in some form.

snejus · 2025-11-17T11:08:10Z

I think I'm missing something... Is this condition so complex it needs testing? #6117 (comment)

semohr · 2025-11-17T11:23:44Z

I mean it kinda is, otherwise we would not have had this issue here (see also). It is not really about the complexity but ensuring the implementation is well defined and stays as expected, we would use the tests here to ensure the functional requirements.

snejus · 2025-11-17T12:26:34Z

Remember the reason I did the investigation behind that comment: after your adjustment, I could not any more understand what this code does. The test does not help.

On the positive side, I'm happy that you drew attention to it, because it does not work the way I'd expect it to, and thus we need to adjust it:

-        if not (search_artist and search_name):
+        if not search_artist and not search_album:
            search_artist, search_name = cur_artist, cur_album

...

-    search_artist = search_artist or item.artist
-    search_title = search_title or item.title
+    if not search_artist and not search_title:
+        search_artist, search_title = item.artist, item.title

I think we are on the same page regarding this simple conditional being unambigous and not needing testing?

I would be keen to keep it simple and stupid and finally get it merged.

semohr · 2025-11-17T13:50:14Z

The simple and stupid approach has introduced this issue and has allowed the desync to be introduced in the first place. I would like this a bit more controlled, either have the implementations coupled with a private function and/or have this properly tested. As you are against a private function to couple the item and album logic (which tbh I don't understand, as we pretty much do the same for the _sort_candidate function), we should at-least enforce the behavior by testing it.

semohr requested a review from a team as a code owner October 21, 2025 09:53

Copilot AI review requested due to automatic review settings October 21, 2025 09:53

Copilot AI reviewed Oct 21, 2025

View reviewed changes

beets/autotag/match.py Outdated Show resolved Hide resolved

sourcery-ai bot reviewed Oct 21, 2025

View reviewed changes

beets/autotag/match.py Show resolved Hide resolved

semohr mentioned this pull request Oct 21, 2025

Empty metadata support for autotagger plugins #6065

Open

semohr force-pushed the typing-candidates-and-item-candidates branch from 8f07788 to 7a250ef Compare October 21, 2025 10:13

snejus requested changes Oct 23, 2025

View reviewed changes

semohr force-pushed the typing-candidates-and-item-candidates branch 2 times, most recently from 8bd5636 to c2f7b4d Compare October 29, 2025 13:51

semohr added 4 commits November 17, 2025 10:24

Unified search string construction between albums and items.

46e2dc9

Adjusted changelog, added suggestions by copilot

7401e7e

Added testcase.

d8c44ad

Removed trailing comma

9ba0a38

semohr force-pushed the typing-candidates-and-item-candidates branch from c2f7b4d to 96d8432 Compare November 17, 2025 09:27

search query are now strigns, removed None type

24a1d93

semohr force-pushed the typing-candidates-and-item-candidates branch from 96d8432 to 24a1d93 Compare November 17, 2025 09:42

semohr added 2 commits November 17, 2025 10:49

Simplified search_ids logic: now uses sequence typing

9a03789

Added note on title and artist usage.

33d1a35

import	artist search	album/title search	resulting search terms
album	`None`	`None`	`- album`
album	`"Artist"`	`""`	`- album`
album	`""`	`"Album"`	`- album`
album	`"Artist"`	`"Album"`	`Artist - Album`
singleton	`None`	`None`	`- title`
singleton	`"Artist"`	`""`	`Artist - title`
singleton	`""`	`"Title"`	`- Title`
singleton	`"Artist"`	`"Title"`	`Artist - Title`

Unified search string construction between albums and items. #6117

Are you sure you want to change the base?

Unified search string construction between albums and items. #6117

Conversation

semohr commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

snejus Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

snejus Oct 23, 2025

Choose a reason for hiding this comment

Previously

Now

Uh oh!

semohr Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

snejus Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

snejus Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

semohr commented Nov 17, 2025

Uh oh!

snejus commented Nov 17, 2025

Uh oh!

semohr commented Nov 17, 2025

Uh oh!

snejus commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

semohr commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

semohr commented Oct 21, 2025 •

edited

Loading

codecov bot commented Oct 21, 2025 •

edited

Loading

snejus commented Nov 17, 2025 •

edited

Loading